Add 'crm sbd' sub-level (jsc#PED-8256) #1491

liangxin1300 · 2024-07-17T01:11:36Z

Motivation

The main configuration for SBD use cases is scattered across sysconfig,
on-disk metadata, and the CIB, and may even involve other OS components,
e.g. coredump, SCSI, and multipath.

It's desirable to reduce the management complexity among them and to
streamline the workflow for the main use case scenarios.

Changes include

Disk-based SBD scenarios

Show usage when syntax error
Completion
Display SBD related configuration (UC4 in PED-8256)
Change the on-disk meta data of the existing sbd disks (UC2.1 in PED-8256)
Add a sbd disk with the existing sbd configuration (UC2.2 in PED-8256)
Remove a sbd disk (UC2.3 in PED-8256)
Remove sbd from cluster
Replace the storage for a sbd disk (UC2.4 in PED-8256)
display status (focusing on the runtime information only) (UC5 in PED-8256)

Disk-less SBD scenarios

Show usage when syntax error (diskless)
completion (diskless)
Display SBD related configuration (UC4 in PED-8256, diskless)
Manipulate the basic diskless sbd configuration (UC3.1 in PED-8256)

liangxin1300 · 2024-07-18T02:18:10Z

Disk-based SBD scenarios
1. Show usage when syntax error
2. Completion
3. Display SBD related configuration (UC4 in PED-8256)
4. Change the on-disk meta data of the existing sbd disks (UC2.1 in PED-8256)
5. Add a sbd disk with the existing sbd configuration (UC2.2 in PED-8256)
6. Remove a sbd disk (UC2.3 in PED-8256)
7. Purge sbd from cluster
8. Replace the storage for a sbd disk (UC2.4 in PED-8256)
9. display status (focusing on the runtime information only) (UC5 in PED-8256)
10. Overwrite case
Disk-less SBD scenarios
1. Show usage when syntax error (diskless)
2. completion (diskless)
3. Display SBD related configuration (UC4 in PED-8256, diskless)
4. Manipulate the basic diskless sbd configuration (UC3.1 in PED-8256)
5. Remove diskless sbd from cluster

Disk-based SBD scenarios

1. Show usage when syntax error

# crm sbd configure xx
ERROR: Invalid argument: xx
Usage:
crm sbd configure show [disk_metadata|sysconfig|property]
crm sbd configure [watchdog-timeout=<integer>] [allocate-timeout=<integer>] [loop-timeout=<integer>] [msgwait-timeout=<integer>] [watchdog-device=<device>]

More syntax error cases
See https://github.com/liangxin1300/crmsh/blob/20240614_crm_sbd_sublevel/test/features/sbd_ui.feature
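
As a rough illustration of the kind of argument check behind this usage message, the sketch below validates `key=value` arguments against the option names listed in the usage text. The function name `parse_configure_args` and the use of `ValueError` are hypothetical and are not taken from crmsh.

```python
# Hypothetical sketch; not the crmsh implementation.
TIMEOUT_KEYS = {"watchdog-timeout", "allocate-timeout", "loop-timeout", "msgwait-timeout"}
ALLOWED_KEYS = TIMEOUT_KEYS | {"watchdog-device"}


def parse_configure_args(args):
    """Turn ['watchdog-timeout=30', ...] into a dict, rejecting unknown input."""
    parsed = {}
    for arg in args:
        key, sep, value = arg.partition("=")
        # Reject bare words, unknown option names, and empty values.
        if not sep or key not in ALLOWED_KEYS or not value:
            raise ValueError(f"Invalid argument: {arg}")
        if key in TIMEOUT_KEYS:
            # The usage text declares the timeouts as <integer>.
            if not value.isdigit():
                raise ValueError(f"Invalid argument: {arg}")
            parsed[key] = int(value)
        else:
            parsed[key] = value
    return parsed
```

With this sketch, `parse_configure_args(["xx"])` raises `ValueError("Invalid argument: xx")`, mirroring the error line in the transcript.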

2. Completion

# crm sbd 
cd          configure   device      disable     help        ls          quit        status      up 

# crm sbd configure 
allocate-timeout=  msgwait-timeout=   watchdog-device=   
loop-timeout=      show               watchdog-timeout=  

# crm sbd configure show 
disk_metadata   property        sysconfig

3. Display SBD related configuration (UC4 in PED-8256)

# crm sbd configure show disk_metadata 
INFO: crm sbd configure show disk_metadata
==Dumping header on disk /dev/sda5
Header version     : 2.1
UUID               : a4f4d842-278c-485d-ada6-7781d88bd632
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 15
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 30
==Header on disk /dev/sda5 is dumped

# crm sbd configure show sysconfig 
INFO: crm sbd configure show sysconfig
SBD_PACEMAKER=yes
SBD_STARTMODE=always
SBD_DELAY_START=71
SBD_WATCHDOG_DEV=/dev/watchdog0
SBD_WATCHDOG_TIMEOUT=15
SBD_TIMEOUT_ACTION=flush,reboot
SBD_MOVE_TO_ROOT_CGROUP=auto
SBD_SYNC_RESOURCE_STARTUP=yes
SBD_OPTS=
SBD_DEVICE=/dev/sda5

# crm sbd configure show property 
INFO: crm sbd configure show property
pcmk_delay_max=30s
have-watchdog=true
stonith-enabled=true
stonith-timeout=83
priority-fencing-delay=60

INFO: systemctl show -p TimeoutStartUSec sbd.service --value
TimeoutStartUSec=90

4. Change the on-disk meta data of the existing sbd disks (UC2.1 in PED-8256)

# crm sbd configure watchdog-timeout=30
WARNING: It's recommended to set msgwait timeout >= 2*watchdog timeout
INFO: Initializing SBD device /dev/sda5
INFO: Update SBD_WATCHDOG_TIMEOUT in /etc/sysconfig/sbd: 30
INFO: Already synced /etc/sysconfig/sbd to all nodes
WARNING: Resource is running, need to restart cluster service manually on each node
WARNING: "priority-fencing-delay" in crm_config is set to 60, it was 0

# crm sbd configure msgwait-timeout=60
INFO: Initializing SBD device /dev/sda5
WARNING: Resource is running, need to restart cluster service manually on each node
INFO: Update SBD_DELAY_START in /etc/sysconfig/sbd: 101
INFO: Already synced /etc/sysconfig/sbd to all nodes
WARNING: "stonith-timeout" in crm_config is set to 119, it was 83

# crm sbd configure show disk_metadata 
==Dumping header on disk /dev/sda5
Header version     : 2.1
UUID               : 15b0a922-ab1b-4abd-b1d1-ab712a12a1ec
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 30
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 60
==Header on disk /dev/sda5 is dumped

# crm sbd configure watchdog-timeout=15
INFO: Initializing SBD device /dev/sda5
INFO: Update SBD_WATCHDOG_TIMEOUT in /etc/sysconfig/sbd: 15
INFO: Already synced /etc/sysconfig/sbd to all nodes
WARNING: Resource is running, need to restart cluster service manually on each node

# crm sbd configure show disk_metadata 
==Dumping header on disk /dev/sda5
Header version     : 2.1
UUID               : d3918be3-8f51-4ca8-aea7-2d3dabdb7fa2
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 15
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 60
==Header on disk /dev/sda5 is dumped
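
The first warning in this transcript reflects the recommendation that the msgwait timeout be at least twice the watchdog timeout. A minimal sketch of that check follows; the function name is hypothetical and the code is only an illustration of the rule, not the crmsh code path.

```python
def msgwait_warning(watchdog_timeout, msgwait_timeout):
    """Return the warning text when msgwait < 2 * watchdog, otherwise None."""
    if msgwait_timeout < 2 * watchdog_timeout:
        return "It's recommended to set msgwait timeout >= 2*watchdog timeout"
    return None
```

Applied to the steps above: watchdog=30 with msgwait=30 triggers the warning, while msgwait=60 with watchdog=30 or watchdog=15 does not.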

5. Add a sbd disk with the existing sbd configuration (UC2.2 in PED-8256)

# crm sbd configure show disk_metadata 
INFO: crm sbd configure show disk_metadata
==Dumping header on disk /dev/sda5
Header version     : 2.1
UUID               : eccc76d1-1930-4437-8cb3-2726ca0ac293
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 15
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 30
==Header on disk /dev/sda5 is dumped

# crm sbd configure show sysconfig |grep DEVICE
SBD_DEVICE=/dev/sda5

# crm sbd device add /dev/sda6
INFO: Configured sbd devices: /dev/sda5
INFO: Append devices: /dev/sda6
INFO: Configuring disk-based SBD
INFO: Initializing SBD device /dev/sda6
INFO: Update SBD_DEVICE in /etc/sysconfig/sbd: /dev/sda5;/dev/sda6
INFO: Already synced /etc/sysconfig/sbd to all nodes
WARNING: Resource is running, need to restart cluster service manually on each node

# crm sbd configure show disk_metadata 
INFO: crm sbd configure show disk_metadata
==Dumping header on disk /dev/sda5
Header version     : 2.1
UUID               : eccc76d1-1930-4437-8cb3-2726ca0ac293
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 15
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 30
==Header on disk /dev/sda5 is dumped

==Dumping header on disk /dev/sda6
Header version     : 2.1
UUID               : 50dbd68b-dcab-4280-b8c4-3af2070acfba
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 15
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 30
==Header on disk /dev/sda6 is dumped

# crm sbd configure show sysconfig |grep DEVICE
SBD_DEVICE="/dev/sda5;/dev/sda6"
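
The transcript shows `SBD_DEVICE` holding a `;`-separated device list, unquoted for a single device and quoted for several. The following sketch parses and re-formats such a line under that observation. Both helper names are hypothetical, and the quoting rule is inferred from the output above rather than from the crmsh source.

```python
def parse_sbd_device(line):
    """Split an 'SBD_DEVICE=...' sysconfig line into a list of device paths."""
    key, _, value = line.strip().partition("=")
    if key != "SBD_DEVICE":
        raise ValueError(f"not an SBD_DEVICE line: {line!r}")
    value = value.strip().strip('"')
    return [dev for dev in value.split(";") if dev]


def format_sbd_device(devices):
    """Join device paths with ';', quoting when there is more than one (assumed rule)."""
    joined = ";".join(devices)
    if len(devices) > 1:
        return f'SBD_DEVICE="{joined}"'
    return f"SBD_DEVICE={joined}"
```

For example, parsing `SBD_DEVICE=/dev/sda5`, appending `/dev/sda6`, and formatting again yields the quoted two-device line shown above.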

6. Remove a sbd disk (UC2.3 in PED-8256)

# crm sbd configure show sysconfig |grep DEVICE
SBD_DEVICE="/dev/sda5;/dev/sda6"

# crm sbd device remove /dev/sda
/dev/sda5   /dev/sda6   

# crm sbd device remove /dev/sda6
INFO: Configured sbd devices: /dev/sda5;/dev/sda6
INFO: Remove devices: /dev/sda6
INFO: Update SBD_DEVICE in /etc/sysconfig/sbd: /dev/sda5
INFO: Already synced /etc/sysconfig/sbd to all nodes
INFO: Requires to restart cluster service to take effect

# crm sbd configure show sysconfig |grep DEVICE
SBD_DEVICE=/dev/sda5

7. Purge sbd from cluster

# crm sbd purge 
INFO: Stop sbd resource 'stonith-sbd'(stonith:fence_sbd)
INFO: Remove sbd resource 'stonith-sbd'
INFO: Disable sbd.service on node alp-1
INFO: Disable sbd.service on node alp-2
INFO: Move /etc/sysconfig/sbd to /etc/sysconfig/sbd.bak on all nodes
INFO: Delete cluster property "stonith-timeout" in crm_config
INFO: Delete cluster property "priority-fencing-delay" in crm_config
WARNING: "stonith-enabled" in crm_config is set to false, it was true
INFO: Restarting cluster service
INFO: BEGIN Waiting for cluster
..........                                                                                                                                                                                         
INFO: END Waiting for cluster

8. Replace the storage for a sbd disk (UC2.4 in PED-8256)

# crm sbd configure show disk_metadata 
INFO: crm sbd configure show disk_metadata
==Dumping header on disk /dev/sda5
Header version     : 2.1
UUID               : a14c7997-b4e8-4490-aec2-3cd0e5746126
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 15
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 30
==Header on disk /dev/sda5 is dumped

==Dumping header on disk /dev/sda6
Header version     : 2.1
UUID               : 9a7fd3fd-71ad-4222-8ff2-e171ed9b776c
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 15
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 30
==Header on disk /dev/sda6 is dumped

# crm sbd device /dev/sda6
ERROR: Invalid argument: /dev/sda6
INFO: Usage: crm sbd device <add|remove> <device>...

# crm sbd device remove /dev/sda6
INFO: Configured sbd devices: /dev/sda5;/dev/sda6
INFO: Remove devices: /dev/sda6
INFO: Update SBD_DEVICE in /etc/sysconfig/sbd: /dev/sda5
INFO: Already synced /etc/sysconfig/sbd to all nodes
INFO: Requires to restart cluster service to take effect

# crm sbd device add /dev/sda9
INFO: Configured sbd devices: /dev/sda5
INFO: Append devices: /dev/sda9
INFO: Configuring disk-based SBD
INFO: Initializing SBD device /dev/sda9
INFO: Update SBD_DEVICE in /etc/sysconfig/sbd: /dev/sda5;/dev/sda9
INFO: Already synced /etc/sysconfig/sbd to all nodes
WARNING: Resource is running, need to restart cluster service manually on each node

# crm sbd configure show disk_metadata 
INFO: crm sbd configure show disk_metadata
==Dumping header on disk /dev/sda5
Header version     : 2.1
UUID               : a14c7997-b4e8-4490-aec2-3cd0e5746126
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 15
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 30
==Header on disk /dev/sda5 is dumped

==Dumping header on disk /dev/sda9
Header version     : 2.1
UUID               : ad807485-70cd-4216-8a03-26197de0878a
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 15
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 30
==Header on disk /dev/sda9 is dumped

# crm cluster restart --all
INFO: The cluster stack stopped on alp-1
INFO: The cluster stack stopped on alp-2
INFO: The cluster stack started on alp-1
INFO: The cluster stack started on alp-2

# ps -ef|grep sbd
root        3578       1  0 13:53 ?        00:00:00 sbd: inquisitor
root        3579    3578  0 13:53 ?        00:00:00 sbd: watcher: /dev/sda5 - slot: 0 - uuid: a14c7997-b4e8-4490-aec2-3cd0e5746126
root        3580    3578  0 13:53 ?        00:00:00 sbd: watcher: /dev/sda9 - slot: 0 - uuid: ad807485-70cd-4216-8a03-26197de0878a
root        3581    3578  0 13:53 ?        00:00:00 sbd: watcher: Pacemaker
root        3582    3578  0 13:53 ?        00:00:00 sbd: watcher: Cluster

9. display status (focusing on the runtime information only) (UC5 in PED-8256)

# crm sbd status
# Type of SBD:
Disk-based SBD configured

# Status of sbd.service:
Node   |Active  |Enabled |Since
alp-1  |YES     |YES     |active since: Tue 2024-10-22 13:53:31 CST
alp-2  |YES     |YES     |active since: Tue 2024-10-22 13:53:31 CST

# Watchdog info:
Node   |Device          |Driver    |Kernel Timeout
alp-1  |/dev/watchdog0  |iTCO_wdt  |10
alp-2  |/dev/watchdog0  |iTCO_wdt  |10

# Status of fence_sbd:
resource stonith-sbd is running on: alp-1

10. overwrite case

Added device has the same metadata as the configured devices

# crm sbd device add /dev/sda7
INFO: Configured sbd devices: /dev/sda5;/dev/sda6
/dev/sda7 has already been initialized by SBD - overwrite (y/n)? n
INFO: Append devices: /dev/sda7
INFO: Update SBD_DEVICE in /etc/sysconfig/sbd: /dev/sda5;/dev/sda6;/dev/sda7
INFO: Already synced /etc/sysconfig/sbd to all nodes
WARNING: Resource is running, need to restart cluster service manually on each node

Overwrite added device

# crm sbd device add /dev/sda7
INFO: Configured sbd devices: /dev/sda5;/dev/sda6
/dev/sda7 has already been initialized by SBD - overwrite (y/n)? y
INFO: Append devices: /dev/sda7
INFO: Configuring disk-based SBD
INFO: Initializing SBD device /dev/sda7
INFO: Update SBD_DEVICE in /etc/sysconfig/sbd: /dev/sda5;/dev/sda6;/dev/sda7
INFO: Already synced /etc/sysconfig/sbd to all nodes
INFO: Restarting cluster service
INFO: BEGIN Waiting for cluster
INFO: END Waiting for cluster

Added device has different metadata from the configured devices

alp-1:~ # crm sbd device add /dev/sda7
INFO: Configured sbd devices: /dev/sda5;/dev/sda6
/dev/sda7 has already been initialized by SBD - overwrite (y/n)? n
WARNING: Device /dev/sda7 doesn't have the same metadata as /dev/sda5
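
A metadata consistency check of this kind can be sketched by parsing the header dump format shown earlier and comparing the timeout fields. Which fields crmsh actually compares is not stated in this thread; comparing only the four timeouts is an assumption, and the helper names are hypothetical.

```python
import re

# Sample text copied from the `crm sbd configure show disk_metadata` output.
SAMPLE_DUMP = """\
==Dumping header on disk /dev/sda5
Header version     : 2.1
UUID               : a4f4d842-278c-485d-ada6-7781d88bd632
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 15
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 30
==Header on disk /dev/sda5 is dumped
"""

TIMEOUT_RE = re.compile(r"^Timeout \((\w+)\)\s*:\s*(\d+)\s*$", re.MULTILINE)


def parse_timeouts(dump_text):
    """Extract the 'Timeout (<name>) : <n>' fields from one header dump."""
    return {name: int(value) for name, value in TIMEOUT_RE.findall(dump_text)}


def same_metadata(dump_a, dump_b):
    """Compare timeouts only; UUIDs are expected to differ between devices."""
    return parse_timeouts(dump_a) == parse_timeouts(dump_b)
```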

Overwrite device via crm cluster init, interactive mode

# crm cluster init sbd
...
Do you wish to use SBD (y/n)? y
SBD_DEVICE in /etc/sysconfig/sbd is already configured to use '/dev/sda5;/dev/sda6' - overwrite (y/n)? y
Path to storage device (e.g. /dev/disk/by-id/...), or "none" for diskless sbd, use ";" as separator for multi path []/dev/sda6
/dev/sda6 has already been initialized by SBD - overwrite (y/n)? y
INFO: Configuring disk-based SBD
INFO: Initializing SBD device /dev/sda6
INFO: Update SBD_DEVICE in /etc/sysconfig/sbd: /dev/sda6
INFO: Update SBD_WATCHDOG_TIMEOUT in /etc/sysconfig/sbd: 15
INFO: Update SBD_WATCHDOG_DEV in /etc/sysconfig/sbd: /dev/watchdog0
INFO: Already synced /etc/sysconfig/sbd to all nodes
INFO: Enable sbd.service on node alp-1
INFO: Enable sbd.service on node alp-2
INFO: Restarting cluster service
...

No overwrite, with different metadata

# crm cluster init sbd
...
Do you wish to use SBD (y/n)? y
SBD_DEVICE in /etc/sysconfig/sbd is already configured to use '/dev/sda5;/dev/sda6' - overwrite (y/n)? y
Path to storage device (e.g. /dev/disk/by-id/...), or "none" for diskless sbd, use ";" as separator for multi path []/dev/sda6;/dev/sda7
/dev/sda6 has already been initialized by SBD - overwrite (y/n)? n
/dev/sda7 has already been initialized by SBD - overwrite (y/n)? n
WARNING: Device /dev/sda7 doesn't have the same metadata as /dev/sda6

Partly overwrite, use the first device's metadata

# crm cluster init sbd
...
Do you wish to use SBD (y/n)? y
SBD_DEVICE in /etc/sysconfig/sbd is already configured to use '/dev/sda6;/dev/sda7' - overwrite (y/n)? y
Path to storage device (e.g. /dev/disk/by-id/...), or "none" for diskless sbd, use ";" as separator for multi path []/dev/sda5;/dev/sda7
/dev/sda5 has already been initialized by SBD - overwrite (y/n)? n
/dev/sda7 has already been initialized by SBD - overwrite (y/n)? y
INFO: Configuring disk-based SBD
INFO: Initializing SBD device /dev/sda7
INFO: Update SBD_DEVICE in /etc/sysconfig/sbd: /dev/sda5;/dev/sda7
INFO: Update SBD_WATCHDOG_DEV in /etc/sysconfig/sbd: /dev/watchdog0
INFO: Already synced /etc/sysconfig/sbd to all nodes
INFO: Enable sbd.service on node alp-1
INFO: Enable sbd.service on node alp-2
INFO: Restarting cluster service
INFO: BEGIN Waiting for cluster
INFO: END Waiting for cluster
WARNING: "stonith-enabled" in crm_config is set to true, it was false
INFO: Update SBD_DELAY_START in /etc/sysconfig/sbd: 71
INFO: Already synced /etc/sysconfig/sbd to all nodes
WARNING: "stonith-timeout" in crm_config is set to 83, it was 60s
WARNING: "priority-fencing-delay" in crm_config is set to 60, it was 0
INFO: Done (log saved to /var/log/crmsh/crmsh.log on alp-1)

# crm sbd  configure show disk_metadata 
INFO: crm sbd configure show disk_metadata
==Dumping header on disk /dev/sda5
Header version     : 2.1
UUID               : a02ade38-169c-4c03-bb1b-8bade3126fe8
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 15
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 30
==Header on disk /dev/sda5 is dumped

==Dumping header on disk /dev/sda7
Header version     : 2.1
UUID               : a2ebd6a0-ab95-40d6-9132-104d96195fc7
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 15
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 30
==Header on disk /dev/sda7 is dumped

Do not overwrite the sysconfig

# crm cluster init
...
Do you wish to use SBD (y/n)? y
SBD_DEVICE in /etc/sysconfig/sbd is already configured to use '/dev/sda5;/dev/sda7' - overwrite (y/n)? n
WARNING: Hawk not installed - not configuring web management interface.
INFO: BEGIN Waiting for cluster
............ 

# crm sbd status
# Type of SBD:
Disk-based SBD configured

# Status of sbd.service:
Node   |Active  |Enabled |Since
alp-1  |YES     |YES     |active since: Wed 2024-10-23 09:36:31 CST

# Watchdog info:
Node   |Device          |Driver    |Kernel Timeout
alp-1  |/dev/watchdog0  |iTCO_wdt  |10

# Status of fence_sbd:
resource stonith-sbd is running on: alp-1

Disk-less SBD scenarios

1. Show usage when syntax error (diskless)

# crm sbd configure xx
ERROR: Invalid argument: xx
Usage:
crm sbd configure show [sysconfig|property]
crm sbd configure [watchdog-timeout=<integer>] [watchdog-device=<device>]

2. completion (diskless)

# crm sbd configure 
show               watchdog-device=   watchdog-timeout=  
# crm sbd configure show 
property    sysconfig

3. Display SBD related configuration (UC4 in PED-8256, diskless)

# crm sbd configure show
INFO: crm sbd configure show sysconfig
SBD_PACEMAKER=yes
SBD_STARTMODE=always
SBD_DELAY_START=41
SBD_WATCHDOG_DEV=/dev/watchdog0
SBD_WATCHDOG_TIMEOUT=15
SBD_TIMEOUT_ACTION=flush,reboot
SBD_MOVE_TO_ROOT_CGROUP=auto
SBD_SYNC_RESOURCE_STARTUP=yes
SBD_OPTS=

INFO: crm sbd configure show property
have-watchdog=true
stonith-enabled=true
stonith-watchdog-timeout=-1
stonith-timeout=71

INFO: systemctl show -p TimeoutStartUSec sbd.service --value
TimeoutStartUSec=90

4. Manipulate the basic diskless sbd configuration (UC3.1 in PED-8256)

# crm sbd configure watchdog-timeout=31
INFO: Configuring diskless SBD
WARNING: Diskless SBD requires cluster with three or more nodes. If you want to use diskless SBD for 2-node cluster, should be combined with QDevice.
INFO: Update SBD_WATCHDOG_TIMEOUT in /etc/sysconfig/sbd: 31
INFO: Already synced /etc/sysconfig/sbd to all nodes
INFO: Restarting cluster service
INFO: BEGIN Waiting for cluster
...........                                                                                                       
INFO: END Waiting for cluster
INFO: Update SBD_DELAY_START in /etc/sysconfig/sbd: 73
INFO: Already synced /etc/sysconfig/sbd to all nodes
WARNING: "stonith-timeout" in crm_config is set to 85, it was 71

# crm sbd configure show
INFO: crm sbd configure show sysconfig
SBD_PACEMAKER=yes
SBD_STARTMODE=always
SBD_DELAY_START=73
SBD_WATCHDOG_DEV=/dev/watchdog0
SBD_WATCHDOG_TIMEOUT=31
SBD_TIMEOUT_ACTION=flush,reboot
SBD_MOVE_TO_ROOT_CGROUP=auto
SBD_SYNC_RESOURCE_STARTUP=yes
SBD_OPTS=

INFO: crm sbd configure show property
have-watchdog=true
stonith-enabled=true
stonith-watchdog-timeout=-1
stonith-timeout=85

INFO: systemctl show -p TimeoutStartUSec sbd.service --value
TimeoutStartUSec=90
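
The derived values in the diskless transcripts (SBD_DELAY_START 41 and stonith-timeout 71 for watchdog-timeout=15, then 73 and 85 for watchdog-timeout=31) are consistent with the arithmetic sketched below. This is an inference from the logged numbers only: the formula, the corosync token and consensus values, and the 60-second floor are all assumptions and are not stated anywhere in this PR.

```python
# Assumed constants, picked so the arithmetic reproduces the transcript above.
COROSYNC_TOKEN = 5          # seconds (assumption)
COROSYNC_CONSENSUS = 6      # seconds (assumption)
STONITH_TIMEOUT_FLOOR = 60  # seconds (assumption)


def diskless_derived_timeouts(watchdog_timeout):
    """Return (SBD_DELAY_START, stonith-timeout) under the assumptions above."""
    base = 2 * watchdog_timeout
    corosync = COROSYNC_TOKEN + COROSYNC_CONSENSUS
    delay_start = base + corosync
    stonith_timeout = max(int(1.2 * base), STONITH_TIMEOUT_FLOOR) + corosync
    return delay_start, stonith_timeout
```

Treat this only as a way to sanity-check the numbers in the logs; the authoritative calculation lives in the crmsh source.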

5. Remove diskless sbd from cluster

# crm sbd disable 
INFO: Disable sbd.service on node alp-1
INFO: Disable sbd.service on node alp-2
INFO: Delete cluster property "stonith-watchdog-timeout" in crm_config
INFO: Delete cluster property "stonith-timeout" in crm_config
WARNING: "stonith-enabled" in crm_config is set to false, it was true
INFO: Requires to restart cluster service to take effect

# ps -ef|grep sbd
root        3418       1  0 08:43 ?        00:00:00 sbd: inquisitor
root        3420    3418  0 08:43 ?        00:00:00 sbd: watcher: Pacemaker
root        3421    3418  0 08:43 ?        00:00:00 sbd: watcher: Cluster
root        3665    1697  0 08:45 pts/0    00:00:00 grep --color=auto sbd

# crm cluster restart --all
INFO: The cluster stack stopped on alp-1
INFO: The cluster stack stopped on alp-2
INFO: The cluster stack started on alp-1
INFO: The cluster stack started on alp-2

# ps -ef|grep sbd
root        3752    1697  0 08:45 pts/0    00:00:00 grep --color=auto sbd


Add a log message to indicate the start of pacemaker.service. This helps users understand that the system is not hanging but is actually starting pacemaker, especially when SBD_DELAY_START is set and it takes longer to start pacemaker.


liangxin1300 · 2024-11-29T02:15:53Z

Add output of sbd process in sbd status:

# crm sbd status
# Type of SBD:
Disk-based SBD configured

# Status of sbd.service:
Node   |Active  |Enabled |Since
alp-1  |YES     |YES     |active since: Fri 2024-11-29 14:46:19 CST
alp-2  |YES     |YES     |active since: Fri 2024-11-29 14:46:19 CST

# Status of sbd process on alp-1:
├─10675 sbd: watcher: /dev/sda5 - slot: 1 - uuid: 8c0e6bbd-d067-4d7e-9531-237da2490799
├─10676 sbd: watcher: /dev/sda6 - slot: 0 - uuid: 0a48b7a3-f9cc-46e6-89a4-e1f1215bdd14
├─10677 sbd: watcher: /dev/sda7 - slot: 0 - uuid: 31776962-477d-40ce-af72-f49a9f6f5dd4

# Status of sbd process on alp-2:
├─9128 sbd: watcher: /dev/sda5 - slot: 0 - uuid: 8c0e6bbd-d067-4d7e-9531-237da2490799
├─9129 sbd: watcher: /dev/sda6 - slot: 1 - uuid: 0a48b7a3-f9cc-46e6-89a4-e1f1215bdd14
├─9130 sbd: watcher: /dev/sda7 - slot: 1 - uuid: 31776962-477d-40ce-af72-f49a9f6f5dd4

# Watchdog info:
Node   |Device          |Driver    |Kernel Timeout
alp-1  |/dev/watchdog0  |iTCO_wdt  |10
alp-2  |/dev/watchdog0  |iTCO_wdt  |10

# Status of fence_sbd:
resource stonith-sbd is running on: alp-1
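
For scripting against this output, the watcher lines can be picked apart with a regular expression. The sketch below is a hypothetical helper based only on the line format shown above; it is not part of crmsh.

```python
import re

WATCHER_RE = re.compile(
    r"sbd: watcher: (?P<device>/\S+) - slot: (?P<slot>\d+) - uuid: (?P<uuid>[0-9a-f-]+)"
)


def parse_watchers(text):
    """Return one dict per disk watcher line found in `text`."""
    return [
        {"device": m["device"], "slot": int(m["slot"]), "uuid": m["uuid"]}
        for m in WATCHER_RE.finditer(text)
    ]
```

Lines such as `sbd: watcher: Pacemaker` are skipped because the pattern requires a device path starting with `/`.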


zzhou1 · 2024-11-29T06:19:00Z

doc/crm.8.adoc

+...............
+# For disk-based SBD
+crm sbd configure show [disk_metadata|sysconfig|property]
+crm sbd configure [device=<dev>]... [watchdog-device=<dev>] [watchdog-timeout=<integer>] [allocate-timeout=<integer>] [loop-timeout=<integer>] [msgwait-timeout=<integer>]


And I am confused by crm sbd configure watchdog-timeout=.... There are 3 similar items: Timeout (watchdog) :, SBD_WATCHDOG_TIMEOUT= and stonith-watchdog-timeout=. Which ones are expected to be modified by this command?

Two major scenarios:

disk-based
Timeout (watchdog) : in the disk metadata is used. SBD_WATCHDOG_TIMEOUT= is useless

diskless
SBD_WATCHDOG_TIMEOUT= and stonith-watchdog-timeout= are meant to be used by diskless-sbd only

zzhou1 · 2024-11-29T06:20:13Z

doc/crm.8.adoc

+...............
+# For disk-based SBD
+crm sbd configure show [disk_metadata|sysconfig|property]
+crm sbd configure [device=<dev>]... [watchdog-device=<dev>] [watchdog-timeout=<integer>] [allocate-timeout=<integer>] [loop-timeout=<integer>] [msgwait-timeout=<integer>]


If Timeout (watchdog) : and SBD_WATCHDOG_TIMEOUT= control the same thing, and only one of them is effective, we should show only the effective one in crm sbd configure show, or indicate in some way which one is effective.

I kind of agree with you. Let's keep debating with Xin ;)

zzhou1 · 2024-11-29T07:16:25Z

crmsh/ui_sbd.py

+        # To keep the order of devices during removal
+        left_device_list = [dev for dev in self.device_list_from_config if dev not in devices_to_remove]
+        if len(left_device_list) == 0:
+            raise self.SyntaxError("Not allowed to remove all devices")


Suggested change

raise self.SyntaxError("Not allowed to remove all devices")

raise self.SyntaxError("Not allowed to remove all devices. Run `crm cluster init sbd -S` to bootstrap the diskless-sbd")

Intentionally not giving "-F" directly here, so that the user thinks twice. We can debate this.
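
For readers without the surrounding class, the guard under discussion can be reproduced as a standalone sketch. The function name is hypothetical, and `ValueError` stands in for the `self.SyntaxError` used in the diff.

```python
def remaining_devices(configured, to_remove):
    """Drop `to_remove` from `configured`, keeping the original order."""
    left = [dev for dev in configured if dev not in to_remove]
    if not left:
        # Removing every device would leave disk-based SBD with nothing to watch.
        raise ValueError("Not allowed to remove all devices")
    return left
```

Iterating over the configured list, rather than using a set difference, is what keeps the surviving devices in their original order.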

zzhou1 · 2024-11-29T13:35:15Z

crmsh/ui_sbd.py

+            return False
+        if not sbd.SBDUtils.is_using_disk_based_sbd():
+            logger.error("Only works for disk-based SBD")
+            logger.info("Please use 'crm cluster init -s <dev1> [-s <dev2> [-s <dev3>]]' to configure disk-based SBD first")


Probably better to suggest using the SBD stage

Suggested change

logger.info("Please use 'crm cluster init -s <dev1> [-s <dev2> [-s <dev3>]]' to configure disk-based SBD first")

logger.info("Please use 'crm cluster init sbd -s <dev1> [-s <dev2>]' to configure the disk-based SBD first")

zzhou1 · 2024-11-29T14:12:47Z

crmsh/ui_sbd.py

+        for node in self.cluster_nodes:
+            out = self.cluster_shell.get_stdout_or_raise_error(scripts_in_shell, node)
+            if out:
+                print(f"# Status of sbd process on {node}:")


Suggested change

print(f"# Status of sbd process on {node}:")

print(f"# Status of the sbd disk watcher process on {node}:")

Also, this information should not be printed for diskless SBD.

zzhou1 · 2024-11-29T14:20:02Z

crmsh/ui_sbd.py

+        )
+        sbd_manager.init_and_deploy_sbd()
+
+    def _configure_diskless(self, parameter_dict: dict):


As with disk-based sbd, I expect to see TimeoutStartSec get updated along with crm sbd configure watchdog-timeout=50, for example.

It sounds like the diskless bootstrap code doesn't do this either.

liangxin1300 force-pushed the 20240614_crm_sbd_sublevel branch 2 times, most recently from 338ed50 to 2f10c6e Compare July 17, 2024 13:52

liangxin1300 force-pushed the 20240614_crm_sbd_sublevel branch 9 times, most recently from a19a863 to cc0d52a Compare July 23, 2024 13:34

liangxin1300 force-pushed the 20240614_crm_sbd_sublevel branch 9 times, most recently from 1456931 to e8f53af Compare August 1, 2024 03:16

liangxin1300 force-pushed the 20240614_crm_sbd_sublevel branch 2 times, most recently from bc2a1fa to 229de46 Compare August 2, 2024 02:50

liangxin1300 force-pushed the 20240614_crm_sbd_sublevel branch 6 times, most recently from 77c1c4f to 5d17668 Compare August 20, 2024 02:07

liangxin1300 force-pushed the 20240614_crm_sbd_sublevel branch from 5d17668 to ce84f84 Compare August 20, 2024 14:45

liangxin1300 added 11 commits November 25, 2024 17:19

Dev: ui_sbd: No need to specify device="" when trying to modify prope…

6b5d7eb

…rties under diskless sbd

Dev: ui_sbd: Add sbd device sub command

84a2db2

Dev: ui_sbd: Replace sbd remove as sbd disable sub-command

b8395cb

Dev: ui_sbd: Adjust sbd confiure interface

a6b1307

After adding sbd device interface to manage devices, related functionalities inside sbd configure interface should be adjusted

Dev: ui_sbd: Check if the adding device is already initialized

9c0e728

and make sure the metadata is consistent between devices.

Dev: ui_sbd: Reuse sbd.SBDManager.restart_cluster_if_possible

d3338f3

to avoid duplicate info message.

Dev: ui_sbd: Check if node is reachable when getting the node list

3c6061c

Dev: sbd: Move constants.SHOW_SBD_START_TIMEOUT_CMD to sbd.py

2ca9e58

Dev: sh: Add get_rc_output_without_input in ClusterShell

b544027

to redirect stderr to stdout.

Dev: ui_sbd: Replace 'sbd disable' as 'sbd purge'

d46228d

And the `sbd purge` command will also move /etc/sysconfig/sbd to /etc/sysconfig/sbd.bak on all nodes.

liangxin1300 force-pushed the 20240614_crm_sbd_sublevel branch 2 times, most recently from 774ea69 to 79535f6 Compare November 25, 2024 10:10

zzhou1 reviewed Nov 28, 2024

doc/crm.8.adoc (outdated, resolved)

zzhou1 reviewed Nov 28, 2024

crmsh/ui_sbd.py (resolved)

liangxin1300 added 3 commits November 29, 2024 09:39

Dev: doc: Upadate crm.8.adoc for SBD help text

56b3a1f

Dev: behave: Add sbd_ui.feature to test the crm sbd UI

c5584b4

Dev: sbd: Split get_sbd_device_interactive into smaller functions

25997ce

liangxin1300 force-pushed the 20240614_crm_sbd_sublevel branch from 79535f6 to 2cbfc71 Compare November 29, 2024 02:14

liangxin1300 added 2 commits November 29, 2024 14:41

Dev: ui_sbd: Print sbd cmdline content in sbd status command

4cfe879

Dev: ui_sbd: Adjust sbd configure subcommand

772494e

- Return immediately if no changes are made - Adjust watchdog timeout and msgwait values properly

liangxin1300 force-pushed the 20240614_crm_sbd_sublevel branch 2 times, most recently from 5a33305 to b9e9853 Compare November 29, 2024 07:08

zzhou1 reviewed Nov 29, 2024


liangxin1300 force-pushed the 20240614_crm_sbd_sublevel branch from b9e9853 to 1dedfe3 Compare November 29, 2024 13:55

liangxin1300 added 2 commits November 29, 2024 22:04

Dev: ui_sbd: Adjust output of sbd status

9de6c4c

Dev: unittests: Adjust unit test for previous commits

a9636b9

liangxin1300 force-pushed the 20240614_crm_sbd_sublevel branch from 1dedfe3 to a9636b9 Compare November 29, 2024 14:07

zzhou1 reviewed Nov 29, 2024

